Abstract

Artificial neurons with arbitrarily complex internal structure are introduced. The neurons can be described in terms of a set of internal variables, a set of activation functions which describe the time evolution of these variables, and a set of characteristic functions which control how the neurons interact with one another. The information capacity of attractor networks composed of these generalized neurons is shown to reach the maximum allowed bound. A simple example taken from the domain of pattern recognition demonstrates the increased computational power of these neurons. Furthermore, a specific class of generalized neurons gives rise to a simple transformation relating attractor networks of generalized neurons to standard three-layer feed-forward networks. Given this correspondence, we conjecture that the maximum information capacity of a three-layer feed-forward network is 2 bits per weight.
1 Introduction



The typical artificial neuron used in neural network research today has its roots in the McCulloch-Pitts [15] neuron. It has a simple internal structure consisting of a single variable, representing the neuron's state, a set of weights, representing the input connections from other neurons, and an activation function, which changes the neuron's state. Typically, the activation function depends upon the sum of the products of the weights with the state variables of the connecting neurons and has a sigmoidal shape, although Gaussian and Mexican Hat functions have also been used. In other words, standard artificial neurons implement a simplified version of the sum-and-fire neuron introduced by Cajal in the nineteenth century.

Contrast this for a moment with the situation in biological systems, where the functional relationship between the neuron spiking rate and the membrane potential is not so simple, depending as it does on a host of neuron-specific parameters. Furthermore, even the notion of a typical neuron is suspect, since mammalian brains consist of many different neuron types, many of whose functional roles in cognitive processing are not well understood.

In spite of these counterexamples from biology, the standard neuron has provided a very powerful framework for studying information processing in artificial neural networks. Indeed, given the success of current models such as those of Little-Hopfield [?], Kohonen [?] or Rumelhart, Hinton and Williams [?], it might be questioned whether or not the internal complexity of the neuron plays any significant role in information processing. In other words, is there any pressing reason to go beyond the simple McCulloch-Pitts neuron?

This paper examines this question by considering neurons of arbitrary internal complexity. Previous researchers have attempted to study the effects of increasing neuron complexity by adding biologically relevant parameters, such as a refractory period or time delays, to the neurodynamics (see e.g. Clark et al., 1985). The problem with such investigations is that they have so far failed to answer the question of whether such parameters are simply an artifact of the biological nature of the neuron or whether they are really needed for higher-order information processing. To date, networks with more realistic neurons look more biologically plausible, but their processing power has not proven to be greater than that of simpler networks. An additional problem with such studies is that, as more and more parameters are added to the neurodynamics, software implementations become too slow to allow one to work with large, realistically sized networks. Although silicon neurons [?] can solve the computational problem, they introduce their own set of artifacts, which may add to or detract from their processing power.

The approach taken here differs from earlier work by extending the neuron while keeping the neurodynamics simple and tractable. In doing so, we will be able to generalize the notion of the neuron as a processing unit, thereby moving beyond the biological neuron to include a wider variety of information processing units. (One should keep in mind that the ultimate goal of the artificial neural network program is not simply to replicate the human brain, but to uncover the general principles of cognitive processing, so as to perform it more efficiently than humans are capable of.) As a byproduct of this approach, we will demonstrate a formal correspondence between attractor networks composed of generalized artificial neurons and the common three-layer feed-forward network.

The paper is organized as follows. In the next section, the concept of the generalized artificial neuron is introduced, its usefulness in attractor neural networks is demonstrated, and the information capacity of such networks is calculated. Section three presents a simple numerical comparison between networks of generalized artificial neurons and the conventional multi-state Hopfield model. Section four discusses various forms that the generalized artificial neuron can take and the meaning to be attached to them. Section five discusses generalized neurons with interacting variables. The paper ends with a discussion of the merits of the present approach. Proofs and derivations are relegated to the appendix.



2 Generalized Artificial Neurons (GAN) 

Since its introduction in 1943 by McCulloch and Pitts, the artificial neuron with a single internal variable (hereafter referred to as the McCulloch-Pitts neuron) has been a standard component of artificial neural networks. The neuron's internal variable may take on only two values, as in the original McCulloch and Pitts model, or it may take on a continuum of values. Even where analog or continuous neurons are used, however, this is usually a matter of expediency; e.g., learning algorithms such as back-propagation [?] require continuous variables even if the application only makes use of a two-state representation.

Whereas the McCulloch-Pitts neuron presupposes that a single variable 
is sufficient to describe the internal state of a neuron, we will generalize this 
notion by allowing neurons with multiple internal variables. In particular, 
we will describe the internal state of a neuron by Q variables. 

Just as biological neurons have no knowledge of the internal states of other neurons, but only exchange electro-chemical signals (Shepherd, 1983), a generalized artificial neuron (GAN) should not be allowed knowledge of the internal states of any other GAN. Instead, each GAN has a set of C characteristic functions, f = {f_k : R^Q → R, k = 1, ..., C}, which provide mappings of the internal variables onto the reals. It is these characteristic functions which are accessible to other GANs. Even though the characteristic functions may superficially resemble the neuron firing rate, no such interpretation need be imposed upon them.

As in the case of McCulloch-Pitts neurons, the time evolution of the internal variables of a GAN is described by deterministic dynamics. Here we distinguish between the different dynamics of the Q internal variables by defining Q activation functions, A_a. These activation functions may depend only upon the values returned by the characteristic functions of the other neurons.

A GAN, 𝒩(Q, f, A), is thus described by a set of internal variables, Q, a set of activation functions, A, and a set of characteristic functions, f. Note that for the McCulloch-Pitts neuron there is only a single internal variable, governed by a single activation function and taking on one of two values, 0 or 1; this variable also doubles as the characteristic function.

Now, to combine these neurons into a network, we must define a network topology. The topology is usually described by a set of numbers, {W_ij} (i, j = 1, ..., N), called variously couplings, weights, connections or synapses, which define the edges of a graph having the neurons sitting on the nodes. (In this paper we will use the term "weight" to denote these numbers.) Obviously, many different network topologies are definable, each possessing its own properties; therefore, in order to make some precise statements, let us consider a specific topology, namely that of a fully connected attractor network [14, ?]. Attractor networks form a useful starting point because they are mathematically tractable and a wealth of information is already known about them.

2.1 Attractor Networks 

For simplicity, consider the case where each of the Q internal variables is described by a single bit; then the most important quantity of interest is the information capacity per weight, S, defined as:

$$ S = \frac{\text{Number of bits stored}}{\text{Number of weights}}. \qquad (1) $$

For a GAN network the number of weights cannot simply be the number of {W_ij}; otherwise it would be difficult for each internal variable to evolve independently. The simplest extension of the standard topology is to allow each internal variable to multiply the weights it uses by an independent factor. Hence, instead of {W_ij} we effectively have {W_ij^a}, where a = 1, ..., Q. A schematic of this type of neuron is given in Figure 1. In an attractor network, the goal is to store P patterns such that the network functions as an auto-associative, or error-correcting, memory. The information capacity, S, for these types of networks is then:

$$ S = \frac{QPN}{QN^2} = \frac{P}{N}\ \text{bpw}, \qquad (2) $$

(bpw = bits per weight). As is well known, there is a fundamental limit on the information capacity of attractor networks, namely S ≤ 2 bpw [?, 13, 17]. This implies P ≤ 2N.

Can this limit be reached with a GAN? To answer this question, consider the case where the activation functions are simply Heaviside functions, H:

$$ s_i^a(t+1) = H\Big(\sum_{j\neq i}^{N} W_{ij}^a f_j(t)\Big), \qquad (3) $$

where H(x) = 0 if x < 0 and H(x) = 1 if x ≥ 0, and s_i^a(t + 1) represents the a-th internal variable of the i-th neuron. Giving each internal variable of the i-th neuron its own set of weights W_ij^a does not violate the principle stated above, because the i-th neuron still has no knowledge of the internal states of the other neurons, and each neuron is free to adjust its own internal state as it sees fit.
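To make the update rule of Eq. (3) concrete, here is a minimal sketch of one synchronous update step. It assumes synchronous updating, a sum over j ≠ i, and a nested-list weight layout W[i][a][j] = W_ij^a; these choices, and the names `heaviside` and `gan_step`, are illustrative rather than taken from the paper. The point it shows is that neuron i only ever reads the characteristic-function values f_j of the other neurons, never their internal bits.

```python
from typing import Callable, List, Sequence

Bits = Sequence[int]


def heaviside(x: float) -> int:
    """H(x) = 0 for x < 0 and H(x) = 1 for x >= 0."""
    return 1 if x >= 0 else 0


def gan_step(
    W: Sequence[Sequence[Sequence[float]]],
    s: Sequence[Bits],
    char_fn: Callable[[Bits], float],
) -> List[List[int]]:
    """One synchronous update of Eq. (3).

    W[i][a][j] holds W_ij^a and s[i][a] holds the a-th internal bit of neuron i.
    Neuron i only reads f_j = char_fn(s[j]) for the other neurons j != i,
    never their internal bits.
    """
    n = len(s)
    f = [char_fn(s_j) for s_j in s]  # values broadcast to the rest of the network
    return [
        [
            heaviside(sum(W[i][a][j] * f[j] for j in range(n) if j != i))
            for a in range(len(s[i]))
        ]
        for i in range(n)
    ]
```

For example, with `char_fn=lambda bits: bits[0] ^ bits[1]`, two internal configurations of a neuron that have the same parity are indistinguishable to every other neuron in the network.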

In appendix A we use Gardner's weight space approach to calculate the information capacity of a network defined by Eq. (3), where we now take into account the fact that the total number of weights has increased from N² to QN². Let p denote the probability that s_i^a = 0 and 1 − p the probability that s_i^a = 1; then S for Eq. (3) becomes:

$$ S = \frac{-p\log_2 p - (1-p)\log_2(1-p)}{1 - p + \frac{1}{2}(2p-1)\,\mathrm{erfc}(x/\sqrt{2})}\ \text{bpw}, \qquad (4) $$

where x is a solution of the following equation:

$$ (2p-1)\,\frac{e^{-x^2/2}}{\sqrt{2\pi}} = x\Big[1 - p + \frac{1}{2}(2p-1)\,\mathrm{erfc}(x/\sqrt{2})\Big], \qquad (5) $$

and erfc(z) is the complementary error function: erfc(z) = (2/√π) ∫_z^∞ dy e^{−y²}.

When p = 1/2, i.e., when s has equal probability of being 0 or 1, then x = 0 and the information capacity reaches its maximum bound of S = 2 bpw. For strongly biased (highly correlated) patterns, e.g., p → 1, the information capacity decreases somewhat, S → 1/(2 ln 2) bpw, but, more importantly, it is still independent of Q.
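Equations (4) and (5) are easy to evaluate numerically. The following is a minimal sketch using only Python's standard library: it solves Eq. (5) for x by bisection (for p > 1/2 the left side minus the right side is positive at x = 0 and negative for large x) and then evaluates Eq. (4). The helper names are hypothetical, and the symmetry p → 1 − p, x → −x of Eq. (5) is used to handle p < 1/2.

```python
import math


def _gauss_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _denominator(p: float, x: float) -> float:
    """1 - p + (1/2)(2p - 1) erfc(x / sqrt 2), the denominator of Eq. (4)."""
    return 1.0 - p + 0.5 * (2.0 * p - 1.0) * math.erfc(x / math.sqrt(2.0))


def solve_x(p: float, tol: float = 1e-13) -> float:
    """Root x of Eq. (5), found by bisection."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p == 0.5:
        return 0.0
    if p < 0.5:
        # Eq. (5) is invariant under p -> 1 - p, x -> -x
        return -solve_x(1.0 - p, tol)

    def g(x: float) -> float:
        return (2.0 * p - 1.0) * _gauss_pdf(x) - x * _denominator(p, x)

    lo, hi = 0.0, 1.0        # g(0) > 0 when p > 1/2
    while g(hi) > 0.0:       # widen the bracket until the sign changes
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def capacity_bpw(p: float) -> float:
    """Information capacity S of Eq. (4), in bits per weight."""
    x = solve_x(p)
    entropy = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return entropy / _denominator(p, x)


print(capacity_bpw(0.5))  # 2.0
```

Evaluating `capacity_bpw` for p closer and closer to 1 shows S falling from 2 bpw toward the 1/(2 ln 2) ≈ 0.72 bpw limit quoted above; for instance, `capacity_bpw(0.9)` comes out at roughly 1.8 bpw.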

What we have shown is that networks of GANs store information as efficiently as networks of McCulloch-Pitts neurons. The difference is that in the former, each stored pattern contains NQ bits of information instead of N. Note that we have neglected the number of bits needed to describe the characteristic functions, since they are proportional to QN, which for large N is much smaller than the number of weights, QN².



3 A Simple Example 

Before continuing with our theoretical analysis, let us consider a simple, concrete example of a GAN network that illustrates its advantages over conventional neural networks. Again, we consider an attractor network composed of GANs. Each GAN has two internal bit variables, Q = {s1, s2}, whose activation functions are given by Eq. (3), and two characteristic functions, f = {g, h}. Let g = s1 ⊕ s2 and h = s1 + 2s2. In the neurodynamics defined by Eq. (3) we will use the function g, reserving the function h for communication outside of the network. (There is no reason why I/O nodes should use the same characteristic functions as compute nodes.)
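The two characteristic functions of this example can be written down directly. The short sketch below (function names are ours) checks two properties that matter later: h is one-to-one on the four internal states, so nothing is lost at the network's I/O, while g assigns a state and its anti-state the same value.

```python
from itertools import product


def g(s1: int, s2: int) -> int:
    """Characteristic function used inside the network: g = s1 XOR s2."""
    return s1 ^ s2


def h(s1: int, s2: int) -> int:
    """Characteristic function used for I/O: h = s1 + 2*s2."""
    return s1 + 2 * s2


states = list(product((0, 1), repeat=2))

# h is one-to-one on the four internal states, so the readout loses nothing
assert sorted(h(*st) for st in states) == [0, 1, 2, 3]

# g gives a state and its anti-state (both bits flipped) the same value
assert all(g(s1, s2) == g(1 - s1, 1 - s2) for s1, s2 in states)
```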

The weights will be fixed using a generalized Hebbian rule [?, ?], i.e.,

= f (6) 
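As an illustration of how such a network can be built and tested, the sketch below stores random four-state patterns and checks that they are fixed points of Eq. (3). It assumes one common ±1-centred form of a generalized Hebbian rule, W_ij^a = (1/N) Σ_μ (2s_i^{aμ} − 1)(2g_j^μ − 1) with W_ii^a = 0. This form, the sizes N = 400 and P = 4, and all names are assumptions made for this sketch, not necessarily the exact rule of Eq. (6).

```python
import random
from typing import List

Pattern = List[List[int]]  # pattern[i][a] = internal bit a of neuron i


def parity(bits: List[int]) -> int:
    """g = XOR of a neuron's internal bits."""
    return sum(bits) % 2


def hebbian_weights(patterns: List[Pattern]) -> List[List[List[float]]]:
    """Hypothetical generalized Hebbian rule (assumed form):
    W[i][a][j] = (1/N) * sum_mu (2 s_i^{a,mu} - 1) * (2 g_j^mu - 1), W[i][a][i] = 0.
    """
    n, q = len(patterns[0]), len(patterns[0][0])
    W = [[[0.0] * n for _ in range(q)] for _ in range(n)]
    for pat in patterns:
        pre = [2 * parity(pat[j]) - 1 for j in range(n)]  # +/-1 version of g_j^mu
        for i in range(n):
            for a in range(q):
                post = 2 * pat[i][a] - 1
                row = W[i][a]
                for j in range(n):
                    if j != i:
                        row[j] += post * pre[j] / n
    return W


def update(W: List[List[List[float]]], state: Pattern) -> Pattern:
    """Synchronous Eq. (3) update with g = XOR as the characteristic function."""
    n = len(state)
    f = [parity(bits) for bits in state]
    return [
        [1 if sum(W[i][a][j] * f[j] for j in range(n)) >= 0 else 0
         for a in range(len(state[i]))]
        for i in range(n)
    ]


rng = random.Random(0)
N, Q, P = 400, 2, 4  # P = 0.01 N, far below the capacity bound
patterns = [[[rng.randint(0, 1) for _ in range(Q)] for _ in range(N)]
            for _ in range(P)]
W = hebbian_weights(patterns)
print(all(update(W, pat) == pat for pat in patterns))
```

With P this far below capacity, each local field is dominated by the term from the pattern being recalled, so every stored pattern should be reproduced by a single update.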

Since this GAN has 4 distinct internal states, we can compare the performance of our GAN network to that of a multi-state Hopfield model [19]. Define the neuron values in the multi-state Hopfield network as s ∈ {−3, −1, 1, 3} and define thresholds at {−2, 0, 2}. (For a detailed discussion regarding the simulation of multi-state Hopfield models see the work of Stiefvater and Müller [?].)
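The multi-state Hopfield neuron used for comparison maps its local field onto one of the four levels using the three thresholds. A minimal sketch of that transfer function follows. Sending a field that lies exactly on a threshold to the upper level is a choice made here; the text does not specify it.

```python
from bisect import bisect_right

LEVELS = (-3, -1, 1, 3)   # allowed neuron values s
THRESHOLDS = (-2, 0, 2)   # boundaries between neighbouring levels


def multistate_output(local_field: float) -> int:
    """Quantize a local field onto the four levels of the multi-state neuron.

    A field exactly on a threshold is sent to the upper level (a choice made here).
    """
    return LEVELS[bisect_right(THRESHOLDS, local_field)]


print([multistate_output(h) for h in (-5.0, -1.5, 0.0, 1.2, 7.0)])  # [-3, -1, 1, 1, 3]
```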



Fig. 2 depicts the basins of attraction for these two different networks, where d_0 is the initial distance from a given pattern to a randomly chosen starting configuration and ⟨d_f⟩ is the average distance to the same pattern when the network has reached a fixed point. For both network types, random sets of patterns were used, with each set consisting of P = 0.05N patterns. The averaging was done over all patterns in a given set and over 100 sets of patterns.

There are two immediate differences between the behavior of the multi-state Hopfield network and the present network: 1) the recall behavior is much better for the network of GANs, and 2) when there is an even number of bit variables, using the XOR function as a characteristic function maps a given state and its anti-state (i.e., the state in which all bits are reversed) onto the same value; for this reason the basins of attraction have a hat-like shape instead of the sigmoidal shape usually seen in the Hopfield model.

This simple example illustrates the difference between networks of conventional neurons and networks of GANs. Not only is the retrieval quality improved but, depending upon the characteristic function, there is also a qualitative difference in the shape of the basins of attraction.

4 Characteristic Functions 

Until now the definition of the characteristic functions, f, has been deliberately left open in order to allow us to consider any set of functions which map the internal variables onto the reals: f = {f : R^Q → R}. In section 2 no restrictions on f were given; however, an examination of the derivation in appendix A reveals that the characteristic functions do need to satisfy some mild conditions before Eq. (4) holds:

$$ 1)\ |\langle f\rangle| \ll \sqrt{N}, \qquad 2)\ \langle f^2\rangle \ll N, \qquad \text{and} \qquad 3)\ \langle f^2\rangle - \langle f\rangle^2 \neq 0. \qquad (7) $$

The first two conditions are automatically satisfied if f is a so-called squashing function, i.e., f : R^Q → [0, 1].

4.1 Linear f and Three Layer Feed-Forward Networks

One of the simplest forms for f is a linear combination of the internal variables. Let the internal variables be bounded to the unit interval, i.e., s_i^a ∈ [0, 1], and let J_i^a denote the coefficient associated with the i-th neuron's a-th internal variable; then f becomes:

$$ f_i(t) = \sum_{a=1}^{Q} J_i^a\, s_i^a(t). \qquad (8) $$

Provided |Σ_{a=1}^Q J_i^a| ≤ 1, and provided not all J_i^a are zero, the three conditions in Eq. (7) will be satisfied. Since the internal variables are bounded to the unit interval, let their respective activation functions be any sigmoidal function, S. Then we can substitute S into Eq. (8) in order to obtain a time-evolution equation solely in terms of the characteristic functions:

$$ f_i(t+1) = \sum_{a=1}^{Q} J_i^a\, S\Big(\sum_{j\neq i}^{N} W_{ij}^a f_j(t)\Big). \qquad (9) $$

Formally, this equation is, for a given i, equivalent to that of a three-layer neural network with N − 1 linear neurons on the input layer, Q sigmoidal neurons in the hidden layer and one linear neuron on the output layer. From the work of Leshno et al. [?]¹, we know that three-layer networks of this form are sufficient to approximate any continuous function F : R^{N−1} → R to any degree of accuracy, provided Q is large enough. Leshno et al.'s result applied to Eq. (9) shows that, at each time step, a network of GANs is capable of approximating any continuous function F : R^N → R^N to any degree of accuracy.
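The correspondence behind Eq. (9) can be checked directly. In the sketch below the logistic function stands in for the sigmoid S (an assumption), and the helper names are hypothetical. For neuron i, the GAN update is computed once as "internal variables, then linear characteristic function", and once as an ordinary three-layer network whose inputs are the f_j (j ≠ i), whose hidden weights are the W_ij^a, and whose output weights are the J_i^a.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gan_next_f(i: int, W, J, f: Sequence[float]) -> float:
    """Eq. (9) with S = logistic sigmoid: f_i(t+1) = sum_a J_i^a S(sum_{j != i} W_ij^a f_j(t))."""
    n = len(f)
    s_new = [sigmoid(sum(W[i][a][j] * f[j] for j in range(n) if j != i))
             for a in range(len(J[i]))]          # internal variables after one step
    return sum(J[i][a] * s_new[a] for a in range(len(J[i])))  # linear f, Eq. (8)


def three_layer_forward(x, hidden_w, out_w) -> float:
    """Standard net: linear inputs x, sigmoidal hidden units, one linear output unit."""
    hidden = [sigmoid(sum(w * xk for w, xk in zip(row, x))) for row in hidden_w]
    return sum(v * z for v, z in zip(out_w, hidden))


def as_three_layer(i: int, W, J, f) -> Tuple[List[float], List[List[float]], List[float]]:
    """Read off the equivalent 3-layer network for neuron i:
    inputs f_j (j != i), hidden weights W_ij^a, output weights J_i^a."""
    n = len(f)
    x = [f[j] for j in range(n) if j != i]
    hidden_w = [[W[i][a][j] for j in range(n) if j != i] for a in range(len(J[i]))]
    return x, hidden_w, list(J[i])
```

For any weights, `gan_next_f(i, W, J, f)` and `three_layer_forward(*as_three_layer(i, W, J, f))` agree, which is the sense in which each GAN computes one hidden layer of Q sigmoidal units per time step.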

In section 2.1 the information capacity of a GAN attractor network was shown to be given by the solution of Eqs. (4) and (5). Given the formal correspondence demonstrated above, the information capacity of a conventional three-layer neural network must be governed by the same set of equations. Hence, the maximum information capacity of a conventional three-layer network is limited to 2 bits per weight.



4.2 Correlation and Grandmother functions 

A special case of the linear weighted sum discussed above is presented by the correlation function:

$$ f_i(t) = \frac{1}{Q}\sum_{a=1}^{Q} t_i^a\, s_i^a(t), \qquad (10) $$

where the {t_i^a} represent a specific configuration of the internal states of 𝒩(Q_i, f_i). With this form for f, the GANs can represent symbols using the following interpretation for f: as f → 1, the symbol is present, and as f → 0, the symbol is not present. Intermediate values represent the symbol's partial presence, as in fuzzy logic approaches. In this scheme, a symbol is represented locally, but the information about its presence in a particular pattern is distributed. Unlike other representational schemes, by increasing the number of internal states, a symbol can be represented by itself. Consider, for example, a pattern recognition system. If Q is large enough, one could represent the symbol for a tree by using the neuron firing pattern for a tree. In this way, the symbol representing a pattern is the pattern itself.

¹ Leshno et al.'s proof is the most general in a series of such proofs. For earlier, more restrictive results see e.g., [?].
Another example for f in the same vein as Eq. (10) is given by:

$$ f_i(t) = \delta_{\{s_i^a(t)\},\{t_i^a\}}, \qquad (11) $$

where δ_{x,y} is the Kronecker delta function: δ_{x,y} = 1 if x = y and 0 otherwise. This equation states that f is one when the values of all internal variables are equal to their values in some predefined configuration. A GAN of this type represents what is sometimes called a grandmother cell.
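The difference between the graded correlation function of Eq. (10) and the all-or-nothing grandmother cell of Eq. (11) can be seen on a toy template. This sketch assumes ±1 coding of both the internal variables and the template t_i^a, so that the correlation equals 1 only for an exact match and falls toward 0 for unrelated states; the template `tree` and the function names are hypothetical.

```python
from typing import Sequence


def correlation_f(s: Sequence[int], template: Sequence[int]) -> float:
    """Eq. (10)-style characteristic function (1/Q) * sum_a t^a s^a.

    Assumes +/-1 coding of both s and the template, so f = 1 only for an exact
    match and intermediate values measure partial presence of the symbol.
    """
    if len(s) != len(template):
        raise ValueError("state and template must have the same length Q")
    return sum(t * x for t, x in zip(template, s)) / len(template)


def grandmother_f(s: Sequence[int], template: Sequence[int]) -> int:
    """Eq. (11): Kronecker delta, 1 only if every internal variable matches."""
    return int(tuple(s) == tuple(template))


tree = (1, -1, 1, 1)          # hypothetical template t_i^a for the symbol "tree"
almost_tree = (1, -1, 1, -1)  # one internal variable differs
print(correlation_f(almost_tree, tree), grandmother_f(almost_tree, tree))  # 0.5 0
```

The correlation function still reports the symbol as "half present" after one internal variable changes, whereas the grandmother cell switches off completely.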



4.3 Other Forms of f

Obviously, there are an infinite number of functions one could use for f, some of which can take us beyond conventional neurons and networks, to a more general view of computation in neural-network-like settings. Return for a moment to the example discussed in section 3:

$$ f_i(t) = \bigoplus_{a=1}^{Q} s_i^a(t). \qquad (12) $$

This simply implements the parity function over all internal variables. For unbiased bits (p = 1/2) it is easy to see that ⟨f⟩ = 1/2 and ⟨f²⟩ − ⟨f⟩² = 1/4; hence, this form of f fulfills all the necessary conditions. Using the XOR function as a characteristic function for a GAN trivially solves Minsky and Papert's objection to neural networks [18], at the expense of using a more complicated neuron.
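The moments of the parity function can be checked by brute-force enumeration of all 2^Q internal states. The sketch below assumes independent bits, each equal to 0 with probability p as in section 2.1. Because f is 0/1 valued, ⟨f²⟩ = ⟨f⟩. For biased bits, the standard identity ⟨(−1)^{Σs}⟩ = (2p − 1)^Q gives ⟨f⟩ = (1 − (2p − 1)^Q)/2, so ⟨f⟩ = 1/2 holds exactly only for unbiased bits (and approximately for large Q).

```python
from itertools import product


def parity(bits) -> int:
    """Eq. (12): XOR of all internal bits."""
    return sum(bits) % 2


def parity_moments(q: int, p: float):
    """Exact <f> and <f^2> - <f>^2 for f = parity of q independent bits,
    each equal to 0 with probability p."""
    mean = 0.0
    for bits in product((0, 1), repeat=q):
        prob = 1.0
        for b in bits:
            prob *= p if b == 0 else 1.0 - p
        mean += prob * parity(bits)
    return mean, mean - mean * mean  # f is 0/1 valued, so <f^2> = <f>


print(parity_moments(4, 0.5))  # (0.5, 0.25)
```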

Of course, Eq. (12) can be generalized to represent any Boolean function. In fact, each f_i could be a different Boolean function, in which case the network would resemble the Kauffman model for genomic systems [?], a model whose chaotic behavior and self-organizational properties have been well studied.



5 Neurons with Interacting Variables 

So far we have considered only the case where the internal variables of the GAN are coupled to the characteristic functions of other neurons and not to each other; however, in principle, there is no reason why the internal variables should not interact. For simplicity, consider once again the case of an attractor network. The easiest method for including the internal variables in the dynamics is to expand Eq. (3) by adding a new set of weights, denoted by {L_i^{ab}}, which couple the internal variables to each other:

$$ s_i^a(t+1) = H\Big(\sum_{j\neq i}^{N} W_{ij}^a f_j(t) + \sum_{b\neq a}^{Q} L_i^{ab} s_i^b(t)\Big). \qquad (13) $$
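The extended dynamics of Eq. (13) add one intra-neuron sum to the update. This sketch assumes synchronous updating, the index conventions j ≠ i and b ≠ a, and the layouts W[i][a][j] = W_ij^a and L[i][a][b] = L_i^{ab}; all names are illustrative. The usage example is a hypothetical isolated neuron whose two bits influence each other only through L.

```python
from typing import Callable, List, Sequence


def heaviside(x: float) -> int:
    return 1 if x >= 0 else 0


def gan_step_interacting(
    W: Sequence[Sequence[Sequence[float]]],   # W[i][a][j] = W_ij^a (between neurons)
    L: Sequence[Sequence[Sequence[float]]],   # L[i][a][b] = L_i^{ab} (inside neuron i)
    s: Sequence[Sequence[int]],
    char_fn: Callable[[Sequence[int]], float],
) -> List[List[int]]:
    """One synchronous update of Eq. (13)."""
    n = len(s)
    f = [char_fn(s_j) for s_j in s]
    new_state = []
    for i in range(n):
        q = len(s[i])
        row = []
        for a in range(q):
            external = sum(W[i][a][j] * f[j] for j in range(n) if j != i)
            internal = sum(L[i][a][b] * s[i][b] for b in range(q) if b != a)
            row.append(heaviside(external + internal))
        new_state.append(row)
    return new_state


# Hypothetical single isolated neuron whose two bits interact only through L
W_single = [[[0.0], [0.0]]]
L_single = [[[0.0, -1.0], [1.0, 0.0]]]
xor = lambda bits: bits[0] ^ bits[1]
print(gan_step_interacting(W_single, L_single, [[1, 1]], xor))  # [[0, 1]]
```

Setting all L_i^{ab} to zero recovers the update of Eq. (3).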

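Using the same technique we used in section 2.1, we can determine the new information capacity for attractor networks (see appendix A):

$$ S = S_0\,\frac{\Big(1 + A\sqrt{\frac{p(1-p)}{\langle\phi^2\rangle - \langle\phi\rangle^2}}\Big)^2}{(1 + A)\Big(1 + A\,\frac{p(1-p)}{\langle\phi^2\rangle - \langle\phi\rangle^2}\Big)}, \qquad (14) $$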



where S_0 is given by Eq. (4), A = Q/N, and ⟨φ⟩ and ⟨φ²⟩ are averages of the characteristic function at the fixed points. From this equation we see that if the fluctuations in the characteristic functions are equal to the fluctuations in the internal variables, i.e., ⟨φ²⟩ − ⟨φ⟩² = p(1 − p), then S = S_0; otherwise, S is always less than S_0.
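Taking Eq. (14) in the form S/S_0 = (1 + Ay)² / ((1 + A)(1 + Ay²)), with y = √(p(1 − p)/(⟨φ²⟩ − ⟨φ⟩²)), the statement that S never exceeds S_0 is an instance of the Cauchy-Schwarz inequality: (1·1 + √A·√A y)² ≤ (1 + A)(1 + Ay²), with equality exactly when y = 1. A small helper (the name `capacity_ratio` is ours) makes this concrete.

```python
import math


def capacity_ratio(A: float, p: float, var_phi: float) -> float:
    """S / S_0 = (1 + A y)^2 / ((1 + A)(1 + A y^2)), with y = sqrt(p(1-p) / var_phi).

    A = Q/N, p = P(internal bit = 0), var_phi = <phi^2> - <phi>^2.
    """
    if var_phi <= 0:
        raise ValueError("characteristic functions must fluctuate (var_phi > 0)")
    y = math.sqrt(p * (1.0 - p) / var_phi)
    return (1.0 + A * y) ** 2 / ((1.0 + A) * (1.0 + A * y * y))


print(capacity_ratio(1.0, 0.5, 0.25))    # 1.0  (equal fluctuations)
print(capacity_ratio(1.0, 0.5, 0.0625))  # 0.9  (phi fluctuates less than the bits)
```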



6 Summary and Discussion 

In summary, we have introduced the concept of the generalized artificial neuron (GAN), 𝒩(Q, f, A), where Q is a set of internal variables, f is a set of characteristic functions acting on those variables and A is a set of activation functions describing the dynamical evolution of those same variables. We then showed that the information capacity of attractor networks composed of such neurons reaches the maximum allowed value of 2 bits per weight. If we use a linear characteristic function à la Eq. (8), then we find a relationship between three-layer feed-forward networks and attractor networks of GANs. This relationship tells us that attractor networks of GANs can approximate an arbitrary continuous function of the form F : R^N → R^N at each time step. Hence, their computational power is significantly greater than that of attractor networks with two-state neurons.

As an example of the increased computational power of the GAN, we presented a simple attractor network composed of four-state neurons. The present network significantly outperformed a comparable multi-state Hopfield model. Not only were the quantitative retrieval properties better, but the qualitative features of the basins of attraction were also fundamentally different. It is this promise of obtaining qualitative improvements over standard models that most sets the GAN approach apart from previous work.

In section 2.1, the upper limit on the information capacity of an attractor network composed of GANs was shown to be 2 bits per weight, while in section 4.1 we demonstrated a formal correspondence between these networks and conventional three-layer feed-forward networks. Evidently, the information capacity results apply to the more conventional feed-forward network as well.

The network model presented here bears some resemblance to models involving hidden (or latent) variables (see e.g., [?]); however, there is one important difference: the hidden variables in other models are only hidden in the sense that they are isolated from the network's inputs and outputs, but they are not isolated from each other; they are allowed full participation in the dynamics, including direct interactions with one another. In our model, the internal neural variables interact only indirectly, via the neurons' characteristic functions.

Very recently, Gelenbe and Fourneau proposed a related approach they call the "Multiple Class Random Neural Network Model". Their model also includes neurons with multiple internal variables; however, they do not distinguish between activation and characteristic functions, and they restrict the form of the activation function to a stochastic variation of the usual sum-and-fire rule. Hence, their model is not as general as the one presented here.

In conclusion, the approach advocated here can be used to exceed the limitations imposed by the McCulloch-Pitts neuron. By increasing the internal complexity we have been able to increase the computational power of the neuron, while at the same time avoiding any unnecessary increase in the complexity of the neurodynamics; hence, there should be no intrinsic limitations to implementing our generalized artificial neurons.
Figure 1: A schematic of a generalized artificial neuron. f_i denotes the value of the i-th neuron's characteristic function; these are the values communicated to other neurons in the network. I_i and O_i denote input and output values used for connections external to the network.





Figure 2: Basins of attraction for a GAN network (lower curves) and for a multi-state Hopfield model (upper curves). In both cases the number of stored patterns is P = 0.05N. In each case two different system sizes are shown, one with N = 100 neurons and one with N = 400 neurons.



A Derivation of the Information Capacity 



For simplicity, consider a homogeneous network of GANs, where the Q internal variables of each neuron are simply bit variables. In addition, we will consider the general case of interacting bits. Given P patterns, with φ_i^μ representing the characteristic functions and σ_i^{aμ} the internal bit variables, then by Eqs. (3) and (13) we see that these patterns will be fixed points if:

$$ (2\sigma_i^{a\mu} - 1)\,\frac{1}{\sqrt{N}}\Big(\sum_{j\neq i}^{N} W_{ij}^a\phi_j^\mu + \sum_{b\neq a}^{Q} L_i^{ab}\sigma_i^{b\mu}\Big) > 0 \quad \text{for all } i, a, \mu. \qquad (15) $$



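In fact, the more positive the left-hand side is, the more stable the fixed points are. Using this equation we can write the total volume of weight space available to the network for storing P patterns as:

$$ V = \prod_{i=1}^{N}\prod_{a=1}^{Q} V_i^a, \qquad (16) $$

where

$$ V_i^a = \frac{1}{Z_i^a}\int \prod_{j} dW_{ij}^a \prod_{b} dL_i^{ab}\;\delta\Big(\sum_j (W_{ij}^a)^2 - N\Big)\,\delta\Big(\sum_b (L_i^{ab})^2 - Q\Big)\prod_{\mu=1}^{P} H\Big((2\sigma_i^{a\mu} - 1)\Big[\frac{1}{\sqrt{N}}\Big(\sum_{j\neq i}^{N} W_{ij}^a\phi_j^\mu + \sum_{b\neq a}^{Q} L_i^{ab}\sigma_i^{b\mu}\Big) - \theta_i\Big] - \kappa\Big) \qquad (17) $$

and

$$ Z_i^a = \int \prod_{j} dW_{ij}^a \prod_{b} dL_i^{ab}\;\delta\Big(\sum_j (W_{ij}^a)^2 - N\Big)\,\delta\Big(\sum_b (L_i^{ab})^2 - Q\Big), $$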



where κ is a constant whose purpose is to make the left-hand side of Eq. (15) as large as possible. (Note that although we have introduced a threshold parameter, θ_i, we will show that thresholds do not affect the results.)

The basic idea behind the weight space approach is that the subvolume, V_i^a, will vanish for all values of P greater than some critical value, P_c. In order to find the average value of P_c, we need to average Eq. (17) over all configurations of σ^{aμ}. Unfortunately, the σ^{aμ} are quenched variables, which means that we have to average the intensive quantities derivable from V instead of averaging over V directly. The simplest such intensive quantity is:

$$ \langle \ln V_i^a\rangle = \lim_{n\to 0}\frac{\langle (V_i^a)^n\rangle - 1}{n}. \qquad (18) $$

The technique for performing the averages in the limit n → 0 is known as the replica method [?].

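By introducing integral representations for the Heaviside functions, H(z − κ) = ∫_κ^∞ dx ∫_{−∞}^{∞} (dy/2π) exp[iy(x − z)], we can perform the averages over the σ^{aμ}:

$$ \langle (V_i^a)^n\rangle = \Big\langle \prod_{A=1}^{n} V_i^{a,A}\Big\rangle, \qquad (19) $$

where the V_i^{a,A} (A = 1, ..., n) are identical replicas of V_i^a, each with its own weights W_ij^{a,A} and L_i^{ab,A}. Suppressing the indices i and a, and writing σ^μ ≡ σ_i^{aμ}, σ^{bμ} ≡ σ_i^{bμ} and θ ≡ θ_i, this becomes:

$$ \langle V^n\rangle = \frac{1}{Z^n}\int \prod_{A=1}^{n}\Big[\prod_{j} dW_j^A \prod_{b} dL_b^A\,\delta\Big(\sum_j (W_j^A)^2 - N\Big)\,\delta\Big(\sum_b (L_b^A)^2 - Q\Big)\Big]\prod_{A,\mu}\int_{\kappa}^{\infty} dx_\mu^A\int_{-\infty}^{\infty}\frac{dy_\mu^A}{2\pi}\;\exp\Big(i\sum_{A,\mu} y_\mu^A x_\mu^A\Big)\Big\langle \exp\Big(-i\sum_{A,\mu} y_\mu^A(2\sigma^\mu - 1)\Big[\frac{1}{\sqrt{N}}\Big(\sum_{j\neq i} W_j^A\phi_j^\mu + \sum_{b\neq a} L_b^A\sigma^{b\mu}\Big) - \theta\Big]\Big)\Big\rangle. \qquad (20) $$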



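First average over the bits of the other neurons (j ≠ i), which enter only through φ_j^μ:

$$ \Big\langle \exp\Big(-\frac{i}{\sqrt{N}}\sum_{A} y_\mu^A(2\sigma^\mu - 1)\sum_{j\neq i} W_j^A\phi_j^\mu\Big)\Big\rangle_{\phi} = \exp\Big(-i\langle\phi\rangle\sum_{A} y_\mu^A(2\sigma^\mu - 1)\frac{1}{\sqrt{N}}\sum_j W_j^A - \frac{\langle\phi^2\rangle - \langle\phi\rangle^2}{2}\sum_{A,B} y_\mu^A y_\mu^B\,\frac{1}{N}\sum_j W_j^A W_j^B + O(N^{-1/2})\Big); \qquad (21) $$

now sum over the σ^{bμ}, where j = i but b ≠ a:

$$ \Big\langle \exp\Big(-\frac{i}{\sqrt{N}}\sum_{A} y_\mu^A(2\sigma^\mu - 1)\sum_{b\neq a} L_b^A\sigma^{b\mu}\Big)\Big\rangle_{\sigma^b} = \exp\Big(-i(1-p)\sum_{A} y_\mu^A(2\sigma^\mu - 1)\frac{1}{\sqrt{N}}\sum_b L_b^A - \frac{p(1-p)}{2}\sum_{A,B} y_\mu^A y_\mu^B\,\frac{1}{N}\sum_b L_b^A L_b^B + O(N^{-1/2})\Big), \qquad (22) $$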



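where we have used p as the probability that σ = 0, ⟨φ⟩ = Σ_σ P(σ) φ(σ) and ⟨φ²⟩ = Σ_σ P(σ) φ(σ)², the sums running over the internal configurations σ of a neuron. If we insert Eqs. (21) and (22) into Eq. (20) and define the following quantities: q^{AB} = (1/N) Σ_j W_j^A W_j^B and r^{AB} = (1/Q) Σ_b L_b^A L_b^B for all A < B (the spherical constraints fix q^{AA} = r^{AA} = 1), and M^A = (1/√N) Σ_j W_j^A and T^A = (1/√Q) Σ_b L_b^A for all A, then Eq. (20) can be rewritten as: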



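$$ \langle V^n\rangle \propto \int \prod_{A<B} dq^{AB}\,dF^{AB}\,dr^{AB}\,dH^{AB}\prod_{A} dM^A\,dE^A\,dT^A\,dC^A\,dz^A\,dU^A\;\exp(N G), \qquad (23) $$

where

$$ G = \alpha G_1(q, r, M, T) + G_2(F, z, E) + A\,G_2(H, U, C) + i\sum_{A<B} F^{AB}q^{AB} + iA\sum_{A<B} H^{AB}r^{AB} + \frac{i}{2}\sum_A z^A + \frac{iA}{2}\sum_A U^A + O(N^{-1/2}). \qquad (24) $$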

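Here α = P/N, and we have introduced another parameter: A = Q/N. The functions G_1 and G_2 are defined as:

$$ G_1(q, r, M, T) = \ln\Big\langle \int_{\kappa}^{\infty}\prod_A dx^A\int_{-\infty}^{\infty}\prod_A\frac{dy^A}{2\pi}\,\exp\Big[i\sum_A y^A x^A + i\sum_A y^A(2\sigma - 1)\big(\theta - \langle\phi\rangle M^A - \sqrt{A}(1-p)T^A\big) - \frac{1}{2}\big(\langle\phi^2\rangle - \langle\phi\rangle^2 + Ap(1-p)\big)\sum_A (y^A)^2 - \sum_{A<B} y^A y^B\big(q^{AB}(\langle\phi^2\rangle - \langle\phi\rangle^2) + r^{AB}Ap(1-p)\big)\Big]\Big\rangle_{\sigma} \qquad (25) $$

and

$$ G_2(x, y, s) = \ln\int_{-\infty}^{\infty}\prod_A dW^A\,\exp\Big[-\frac{i}{2}\sum_A y^A(W^A)^2 - i\sum_{A<B} x^{AB}W^A W^B - i\sum_A s^A W^A\Big], $$

where ⟨·⟩_σ denotes the average over the bit σ, which is 0 with probability p and 1 with probability 1 − p.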


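The so-called replica-symmetric solution is found by taking q^{AB} = q, r^{AB} = r, F^{AB} = F and H^{AB} = H for all A < B, and setting z^A = z, U^A = U, E^A = E, C^A = C, M^A = M and T^A = T for all A. In terms of replica-symmetric variables, G_2 has the form (up to an additive constant):

$$ G_2(x, y, s) = -\frac{n}{2}\ln(iy - ix) - \frac{n}{2}\,\frac{x}{y - x} - \frac{n}{2}\,\frac{s^2}{iy - ix} + O(n^2), \qquad (26) $$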



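while G_1 can be reduced to:

$$ G_1 = n\int_{-\infty}^{\infty} Dt\,\big[p\ln\gamma_- + (1-p)\ln\gamma_+\big] + O(n^2), \qquad (27) $$

where

$$ \gamma_{\mp} = \frac{1}{2}\,\mathrm{erfc}\left(\frac{\kappa \mp v + t\sqrt{q(\langle\phi^2\rangle - \langle\phi\rangle^2) + rAp(1-p)}}{\sqrt{2\big[(1-q)(\langle\phi^2\rangle - \langle\phi\rangle^2) + (1-r)Ap(1-p)\big]}}\right), \qquad (30) $$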



and we have set Dt = e^{−t²/2} dt/√(2π) and v = θ − ⟨φ⟩M − √A(1 − p)T. erfc(z) is the complementary error function: erfc(z) = (2/√π) ∫_z^∞ dy e^{−y²}. Since the integrand of Eq. (23) grows exponentially with N, we can evaluate the integral using steepest-descent techniques. The saddle-point equations which need to be satisfied are:



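$$ \frac{\partial G}{\partial E} = \frac{\partial G}{\partial C} = \frac{\partial G}{\partial z} = \frac{\partial G}{\partial U} = 0, \qquad (31) $$

$$ \frac{\partial G}{\partial q} = \frac{\partial G}{\partial r} = 0. \qquad (32) $$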



Solving this set of equations yields a system of three equations which define q, r and v in terms of α and A. A little reflection reveals that when α = P/N approaches its critical value, α_c = P_c/N, then q → 1 and r → 1 (at capacity the volume of solutions shrinks to a single point, so different replicas become identical); hence, this limit yields the critical information capacity. From Eq. (32) the following relationship between q and r as they both approach 1 can be deduced:



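$$ \lim_{q,r\to 1}\frac{1 - r}{1 - q} = \sqrt{\frac{\langle\phi^2\rangle - \langle\phi\rangle^2}{p(1-p)}}. \qquad (33) $$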

We can now write the information capacity per weight as:

$$ S = \big[-p\log_2 p - (1-p)\log_2(1-p)\big]\frac{QPN}{QN^2 + Q^2N} = \big[-p\log_2 p - (1-p)\log_2(1-p)\big]\frac{\alpha_c}{1 + A}, \qquad (34) $$



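with:

$$ \alpha_c^{-1} = \left\{p\left[\frac{(K-V)e^{-(K-V)^2/2}}{\sqrt{2\pi}} + \frac{1 + (K-V)^2}{2}\,\mathrm{erfc}\Big(\frac{V-K}{\sqrt{2}}\Big)\right] + (1-p)\left[\frac{(K+V)e^{-(K+V)^2/2}}{\sqrt{2\pi}} + \frac{1 + (K+V)^2}{2}\,\mathrm{erfc}\Big(-\frac{K+V}{\sqrt{2}}\Big)\right]\right\}\times\frac{1 + A\,\frac{p(1-p)}{\langle\phi^2\rangle - \langle\phi\rangle^2}}{\Big(1 + A\sqrt{\frac{p(1-p)}{\langle\phi^2\rangle - \langle\phi\rangle^2}}\Big)^2}, \qquad (35) $$

where V is implicitly defined through:

$$ p\left[\frac{e^{-(K-V)^2/2}}{\sqrt{2\pi}} + \frac{K-V}{2}\,\mathrm{erfc}\Big(\frac{V-K}{\sqrt{2}}\Big)\right] = (1-p)\left[\frac{e^{-(K+V)^2/2}}{\sqrt{2\pi}} + \frac{K+V}{2}\,\mathrm{erfc}\Big(-\frac{K+V}{\sqrt{2}}\Big)\right], \qquad (36) $$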



and K = κ/[⟨φ²⟩ − ⟨φ⟩² + Ap(1 − p)]^{1/2}. Note: for a given p, the maximum value of S occurs when K = 0. By setting K and A equal to zero, one recovers Eqs. (4) and (5) in the text, with x = V. (It is also interesting to note that since V = [θ − ⟨φ⟩M − √A(1 − p)T]/[⟨φ²⟩ − ⟨φ⟩² + Ap(1 − p)]^{1/2}, M and T, which represent the average values of the inter- and intra-neuron weights respectively, are not uniquely determined; rather, solving Eq. (36) for V only fixes the combination ⟨φ⟩M + √A(1 − p)T − θ. Furthermore, the threshold θ can easily be absorbed into either M or T provided either ⟨φ⟩ ≠ 0 or p ≠ 1.)
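As a numerical consistency check, the sketch below evaluates Eqs. (35) and (36), taking the brackets to be the Gaussian integrals ∫_{−a}^{∞} Dt (t + a)² = a e^{−a²/2}/√(2π) + ½(1 + a²) erfc(−a/√2) and ∫_{−a}^{∞} Dt (t + a) = e^{−a²/2}/√(2π) + (a/2) erfc(−a/√2), with a = K ∓ V. It confirms numerically that K = A = 0 reproduces the denominator of Eq. (4), and that for A > 0 the critical load is reduced by the ratio discussed in section 5. The helper names are hypothetical. The left side of Eq. (36) minus the right side decreases monotonically in V, so bisection finds the unique root.

```python
import math


def _phi(a: float) -> float:
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def _m(a: float) -> float:
    """phi(a) + (a/2) erfc(-a/sqrt 2): one bracket of Eq. (36)."""
    return _phi(a) + 0.5 * a * math.erfc(-a / math.sqrt(2.0))


def _i(a: float) -> float:
    """a phi(a) + (1 + a^2)/2 erfc(-a/sqrt 2): one bracket of Eq. (35)."""
    return a * _phi(a) + 0.5 * (1.0 + a * a) * math.erfc(-a / math.sqrt(2.0))


def solve_v(p: float, K: float, tol: float = 1e-13) -> float:
    """Root V of Eq. (36) by bisection; F(V) = left side - right side decreases in V."""
    def F(v: float) -> float:
        return p * _m(K - v) - (1.0 - p) * _m(K + v)

    lo, hi = -1.0, 1.0
    while F(lo) < 0.0:       # push lo left until F(lo) >= 0
        lo *= 2.0
    while F(hi) > 0.0:       # push hi right until F(hi) <= 0
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha_c(p: float, K: float = 0.0, A: float = 0.0, var_phi: float = 0.25) -> float:
    """Critical load alpha_c = P_c/N from Eqs. (35)-(36); var_phi = <phi^2> - <phi>^2."""
    v = solve_v(p, K)
    inv = p * _i(K - v) + (1.0 - p) * _i(K + v)
    y2 = p * (1.0 - p) / var_phi
    inv *= (1.0 + A * y2) / (1.0 + A * math.sqrt(y2)) ** 2
    return 1.0 / inv


print(round(alpha_c(0.5), 6))  # 2.0
```

With A = 0, α_c^{-1} reduces algebraically to 1 − p + ½(2p − 1) erfc(V/√2), and the code reproduces this to within the bisection tolerance.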

We arrived at Eqs. (35) and (36) using the saddle-point conditions of Eqs. (31) and (32). As the reader can readily verify, these saddle-point equations are also locally stable. Furthermore, since the volume of the space of allowable weights is connected and tends to zero as q, r → 1, the locally stable solution we have found must be the unique solution. Therefore, in this case, the replica-symmetric solution is also the exact solution.